STUDENT PAPER: A Multiagent Reinforcement Learning Algorithm by Dynamically Merging Markov Decision Processes

Authors

  • Mohammad Ghavamzadeh
  • Sridhar Mahadevan
Abstract

One strategy for accelerating the learning of cooperative multiagent tasks is to reuse (good or optimal) solutions to the task when each agent is acting alone. In this paper, we formalize this approach as dynamically merging solutions to multiple Markov decision processes (MDPs), each representing an individual agent's solution when acting alone, to obtain (good or optimal) solutions to the overall multiagent MDP when all the agents act together. We present a new learning algorithm called MAPLE (MultiAgent Policy LEarning) that uses Q-learning and dynamic merging to efficiently construct global solutions to the overall multiagent problem from solutions to the individual MDPs. We illustrate the efficiency of MAPLE by comparing its performance with standard Q-learning applied to the overall multiagent MDP. We also describe a corresponding planning algorithm that, given complete knowledge of the underlying single-agent MDPs, uses dynamic merging to efficiently solve the multiagent MDP. We also illustrate how the dynamic merging framework can be extended to the case when agents use temporally extended actions, by using semi-Markov decision processes (SMDPs) to represent variable-length decision epochs.
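The idea in the abstract of reusing single-agent solutions as a starting point for the joint problem can be illustrated with a small sketch. This is not the MAPLE algorithm from the paper: the additive merge rule, the function names `merge_q_tables` and `q_update`, and the toy states, actions, and reward are all assumptions made for illustration. It only shows one plausible way to seed a joint Q-table from two stand-alone Q-tables and then refine it with an ordinary tabular Q-learning backup.

```python
import itertools


def merge_q_tables(q1, q2):
    """Build initial joint Q-values from two stand-alone Q-tables.

    q1 and q2 map (state, action) -> value for one agent acting alone.
    Summing the two values is an assumption of this sketch, not the
    merge rule used in the paper.
    """
    joint = {}
    for (s1, a1), v1 in q1.items():
        for (s2, a2), v2 in q2.items():
            joint[((s1, s2), (a1, a2))] = v1 + v2
    return joint


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning backup to q, in place."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical stand-alone solutions for two single-state agents.
q_a = {("s0", "left"): 0.0, ("s0", "right"): 1.0}
q_b = {("t0", "left"): 0.5, ("t0", "right"): 0.0}

joint_q = merge_q_tables(q_a, q_b)
joint_actions = list(itertools.product(["left", "right"], repeat=2))

# Hypothetical interaction: the individually best actions conflict when
# taken together, so the joint reward is negative and the merged
# estimate for that joint action is revised downward.
new_value = q_update(joint_q, ("s0", "t0"), ("right", "left"),
                     -1.0, ("s0", "t0"), joint_actions)
```

Seeding the joint table this way means learning starts from the agents' individual knowledge and only has to correct the entries where acting together changes the outcome, which is the intuition the abstract describes.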

Similar resources

Multiagent Reinforcement Learning in Stochastic Games

We adopt stochastic games as a general framework for dynamic noncooperative systems. This framework provides a way of describing the dynamic interactions of agents in terms of individuals' Markov decision processes. By studying this framework, we go beyond the common practice in the study of learning in games, which primarily focuses on repeated games or extensive-form games. For stochastic games...

A Multiagent Reinforcement Learning algorithm to solve the Community Detection Problem

Community detection is a challenging optimization problem that consists of searching for communities that belong to a network under the assumption that the nodes of the same community share properties that enable the detection of new characteristics or functional relationships in the network. Although there are many algorithms developed for community detection, most of them are unsuitable when ...

Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Multi agent Markov decision processes (MMDPs), as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi agent systems and are a suitable framework for multi agent reinforcement learning. In this paper, a generalized learning automata based algorithm for finding optimal policies in MMDPs is proposed. In the proposed algorithm, MMDP ...

Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds

Perkins’ Monte Carlo exploring starts for partially observable Markov decision processes (MCES-P) integrates Monte Carlo exploring starts into a local search of policy space to offer a template for reinforcement learning that operates under partial observability of the state. In this paper, we generalize the reinforcement learning under partial observability to the self-interested multiagent se...

Errata Preface Recent Advances in Hierarchical Reinforcement Learning

Decision Making, Guest Edited by Xi-Ren Cao. The Publisher offers an apology for printing an incorrect version of the paper in the special issue and renders this paper as the true and correct paper. Abstract. Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent atte...

Multiple-Goal Reinforcement Learning with Modular Sarsa(0)

We present a new algorithm, GM-Sarsa(0), for finding approximate solutions to multiple-goal reinforcement learning problems that are modeled as composite Markov decision processes. According to our formulation, different sub-goals are modeled as MDPs that are coupled by the requirement that they share actions. Existing reinforcement learning algorithms address similar problem formulations by fir...

Publication date: 2002